Modeling skewed distributions using multifractals and the '80-20 law'
Abstract
The focus of this paper is on the characterization of the skewness of an attribute-value distribution and on the extrapolations for interesting parameters. More specifically, given a vector with the highest h multiplicities $\vec{m} = (m_1, m_2, \ldots, m_h)$, and some frequency moments $F_q = \sum_i m_i^q$ (e.g., $q = 0, 2$), we provide effective schemes for obtaining estimates about either its statistics or subsets/supersets of the relation. We assume an 80/20 law, and specifically, a $p/(1-p)$ law. This law gives a distribution which is commonly known in the fractals literature as 'multifractal'. We show how to estimate p from the given information (first few multiplicities, and a few moments), and present the results of our experiments on real data. Our results demonstrate that schemes based on our multifractal assumption consistently outperform those schemes based on the uniformity assumption, which are commonly used in current DBMSs. Moreover, our schemes can be used to provide estimates for supersets of a relation, which the schemes based on the uniformity assumption cannot provide at all.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment.
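As a rough illustration of the p/(1 - p) law, the sketch below builds an idealized distribution by splitting a total count k times in proportion p : (1 - p), and then recovers p from the second frequency moment F_2. The function names, the assumption of exactly 2^k attribute values with real-valued counts, and the closed-form inversion are illustrative assumptions; this is not the estimation procedure from the paper, which also uses the first few multiplicities.

```python
import math


def cascade_counts(total, p, k):
    """Idealized multiplicities for 2**k values under a repeated p/(1-p) split.

    Assumption: counts are real-valued and every split uses the same bias p.
    """
    counts = [float(total)]
    for _ in range(k):
        # Each existing count is divided into a p share and a (1 - p) share.
        counts = [c * w for c in counts for w in (p, 1.0 - p)]
    return counts


def frequency_moment(counts, q):
    """F_q = sum of m_i**q over the values that actually occur (m_i > 0)."""
    return sum(c ** q for c in counts if c > 0)


def estimate_bias(total, f2, k):
    """Invert F_2 = total**2 * (p**2 + (1 - p)**2)**k for p >= 0.5.

    Hypothetical helper: valid only for the idealized cascade above.
    """
    s = (f2 / total ** 2) ** (1.0 / k)   # s = p**2 + (1 - p)**2
    s = min(max(s, 0.5), 1.0)            # guard against rounding outside [0.5, 1]
    return 0.5 * (1.0 + math.sqrt(2.0 * s - 1.0))


if __name__ == "__main__":
    counts = cascade_counts(total=1000, p=0.8, k=5)
    f0 = frequency_moment(counts, 0)     # number of distinct values
    f2 = frequency_moment(counts, 2)     # self-join size
    print(f0, f2, estimate_bias(1000, f2, 5))
```

With p = 0.5 the same construction yields the uniform distribution, so the bias p can be read as a single knob that moves between the uniformity assumption and a heavily skewed 80/20-style distribution.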
Similar resources
Using Weighted Distributions for Modeling Skewed, Multimodal and Truncated Data
When the observations reflect a multimodal, asymmetric or truncated construction, or a combination of them, using the usual unimodal and symmetric distributions leads to misleading results. Therefore, distributions with the ability to model skewness, multimodality and truncation have always been of core interest in the statistical literature. There are different methods to contract ...
Modeling skewed distributions using multifractals and the '80-20 law'
The focus of this paper is on the characterization of the skewness of an attribute-value distribution and on the extrapolations for interesting parameters. More specifically, given a vector with the highest h multiplicities $\vec{m} = (m_1, m_2, \ldots, m_h)$, and some frequency moments $F_q = \sum_i m_i^q$ (e.g., $q = 0, 2$), we provide effective schemes for obtaining estimates about either its statistics or subsets...
Modeling Fractal Structure of City-Size Distributions Using Correlation Functions
Zipf's law is one of the most conspicuous empirical facts for cities; however, there is no convincing explanation for the scaling relation between rank and size or for its scaling exponent. Using ideas from general fractals and scaling, I propose a dual competition hypothesis of city development to explain the value intervals and the special value, 1, of the power exponent. Zipf's law and Pareto's...
Modeling skewed distributions using multifractals and the '80-20 law'
The focus of this paper is on the characterization of the skewness of an attribute-value distribution and on the extrapolations for interesting parameters. More specifically, given a vector with the highest h multiplicities $\vec{m} = (m_1, m_2, \ldots, m_h)$ and some frequency moments $F_q = \sum_i m_i^q$ (e.g., $q = 0, 2$), we provide effective schemes for obtaining estimates about either its statistics or subsets/supersets of the relation. We ass...